Abstract. In citation networks, the activity of papers usually decreases with age,
and dormant papers may be discovered and become fashionable again. To model this
phenomenon, a competition mechanism is suggested which incorporates two factors:
vigorousness and dormancy. Based on this idea, a citation network model is proposed,
in which a node has two discrete states: vigorous and dormant. Vigorous nodes can
be deactivated, and dormant nodes may be activated and become vigorous. The
evolution of the network couples the addition of new nodes with state transitions of old
ones. Both analytical calculation and numerical simulation show that the degree
distribution of nodes in the generated networks is right-skewed.
In particular, scale-free networks are obtained when the node to be deactivated is
selected in a targeted (degree-dependent) way, and exponential networks are obtained
when it is selected at random. Moreover, measurements of four real-world citation
networks agree well with the stochastic model.

1. Introduction

The citation patterns of scientific publications can be simplified into a citation network
with nodes representing scientific articles published in journals and edges representing
citations from one article to another published previously [1]. Citation networks are
valuable for uncovering the dynamics of scientific publications and have been studied for
a long time [2]. A particularly noteworthy contribution was a 1965 study by de Solla
Price [3], who proposed the so-called "cumulative advantage" mechanism, that is, a
paper which has been cited many times is more likely to be cited again than one
which has been little cited. Cumulative advantage is based on the idea of "rich get
richer" suggested by Yule [4] and Simon [5], and the criterion is now widely known as
"preferential attachment" in the framework of currently fashionable evolving network
models, proposed by Barabási and Albert in 1999 [6]. By employing growth and
preference, the Barabási-Albert (BA) model provides a natural explanation for the
scale-free behavior observed in many real systems. Recently, Clauset et al. [7] proposed a
statistical framework for determining power-law tails of various data sets, in accordance
with the conclusion of Redner [8].

In the study of citation networks, one of the most important topics is the
characterization of the probability distribution of the number of citations received
by a paper and the design of simple microscopic models to reproduce the real-world
distribution [9]. Many empirical studies of citation networks have shown that age may
be one of the most important factors determining the statistical properties of
the growing network [10-21]. To investigate the
effect of age on network evolution, the BA model has been modified by incorporating
time dependence in citation networks. Dorogovtsev and Mendes [10] studied the case in
which the probability that an old node is attached to by a newcomer is proportional not
only to its degree $k$ but also to a power of its age, $\tau^{-\alpha}$ (where $\tau$ is the age of the
node). They found that the resulting network shows scale-free (SF) behavior only in the
region $\alpha < 1$. For $\alpha > 1$, the degree distribution $P(k)$ is exponential. On the other
hand, Klemm and Eguíluz [11] proposed a degree-dependent deactivation network model,
which is highly clustered and retains the power-law distribution of node degrees.

Most previous studies consider only the irreversible impact of age, such as gradual
aging [10] and absolute deactivation [11]. In the real world, however, there is a universal
phenomenon called "delayed recognition", that is, papers that do not seem to achieve any
sort of recognition until some years after their original publication [22, 23]. The question
therefore arises as to whether such a process can be explained or expected by network
theory. In this paper we express the notion of delayed recognition in terms of
an evolving network model with transitions of nodes' states to answer this question.
Intuitively, we suggest that the activity of a node is the result of the competition of two
factors: vigorousness and dormancy. Vigorousness describes how the ability of a paper,
whether newly published or old, to receive citations from others increases gradually
with time, whereas dormancy describes the deactivation of the paper, after which it
sleeps. The evolution of the network couples the addition of new nodes with state transitions
of old ones. It is found that the degree distribution of the resulting network depends
on the transition probability. Furthermore, we study four real-world citation data sets and
find good agreement with the present model.

2. Model 

The evolution process starts with an initial network of a small number $m_0$ of isolated
nodes, of which $m$ ($m < m_0$) nodes are vigorous. Motivated by previous research
[11, 20], at each time step the dynamics runs as follows.

(i) Adding a new node $i$ with $m$ outgoing links that are attached to the $m$
existing vigorous nodes. We assume that $m$ is the average number of references per
article. By $k'$ we denote the in-degree of a node, i.e., the number of edges pointing to it.
The in-degree of the newcomer is $k'_i = 0$ at first. Each selected vigorous node $j$ receives
exactly one incoming edge, so that $k'_j \to k'_j + 1$. Since the out-degree of each node is
always $m$, the total degree of a node is $k = k' + m$.

(ii) Activating the new node $i$, which means that a newly published paper is always
assumed to be vigorous at first.

(iii) Awakening one of the previously existing dormant nodes. For simplicity, we 
assume that each dormant node is chosen uniformly to be activated. 

(iv) Deactivating two of the vigorous nodes. The probability of a vigorous node $j$
being deactivated is given by

$$\nu_{k'_j} = \frac{\gamma - 1}{a + k'_j}, \tag{1}$$

where $a > 0$ is a preferential factor reflecting the initial attractiveness of different fields,
and the normalization factor is defined as $\gamma - 1 = [\sum_{i \in \mathcal{A}} (a + k'_i)^{-1}]^{-1}$. The summation
runs over the set $\mathcal{A}$ of the currently vigorous nodes. Eq. (1) means that the most cited
papers are the least likely to be forgotten.
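
As a small illustration of the deactivation rule, the following Python sketch computes the deactivation probabilities for a given set of vigorous nodes. It assumes the form $\nu_j = (\gamma - 1)/(a + k'_j)$, with $\gamma - 1$ fixed by normalization over the vigorous set; the function name and the sample in-degrees are hypothetical and chosen only for the example.

```python
def deactivation_probabilities(in_degrees, a):
    """Return nu_j = (gamma - 1) / (a + k'_j) for each vigorous node j.

    in_degrees: in-degrees k'_j of the currently vigorous nodes.
    a: positive preferential factor (assumed form of Eq. (1)).
    """
    if a <= 0:
        raise ValueError("a must be positive")
    weights = [1.0 / (a + k) for k in in_degrees]
    gamma_minus_1 = 1.0 / sum(weights)  # normalization factor gamma - 1
    return [gamma_minus_1 * w for w in weights]


# Hypothetical vigorous set: a fresh paper, a moderately cited one, a highly cited one.
probs = deactivation_probabilities([0, 5, 50], a=12.0)
```

The probabilities sum to one, and the node with the largest in-degree receives the smallest probability, which is the "less likely to be forgotten" property stated above.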

According to the model definition, vigorous nodes may gradually become dormant,
which can be interpreted as a collective "forgetting". At the same time,
dormant nodes may be awakened and receive links from subsequent nodes again, which
accounts for the rediscovery of "forgotten" papers.
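
The four steps can be sketched as a short simulation. This is a minimal sketch under stated assumptions, not the authors' code: the first $m$ initial nodes are taken to be the vigorous ones, and the two deactivated nodes are drawn one after the other without replacement, with weights $1/(a + k')$ recomputed after the first draw (the text does not specify either detail). The function name `simulate` and its parameters are hypothetical.

```python
import random


def simulate(m, m0, a, steps, seed=0):
    """Grow a network by steps (i)-(iv); returns (in_deg, vigorous, dormant)."""
    if not (0 < m < m0) or a <= 0:
        raise ValueError("need 0 < m < m0 and a > 0")
    rng = random.Random(seed)
    in_deg = [0] * m0              # in-degree k' of every node
    vigorous = list(range(m))      # assumption: the first m nodes start vigorous
    dormant = list(range(m, m0))
    for _ in range(steps):
        # (i) the newcomer cites each of the m vigorous nodes exactly once
        for j in vigorous:
            in_deg[j] += 1
        new_node = len(in_deg)
        in_deg.append(0)
        # (ii) the newcomer starts out vigorous
        vigorous.append(new_node)
        # (iii) awaken one dormant node chosen uniformly at random
        woken = dormant.pop(rng.randrange(len(dormant)))
        vigorous.append(woken)
        # (iv) deactivate two vigorous nodes with weight 1 / (a + k')
        #      (assumption: sequential draws without replacement)
        for _ in range(2):
            weights = [1.0 / (a + in_deg[j]) for j in vigorous]
            pos = rng.choices(range(len(vigorous)), weights=weights)[0]
            dormant.append(vigorous.pop(pos))
    return in_deg, vigorous, dormant


in_deg, vigorous, dormant = simulate(m=3, m0=6, a=5.0, steps=200, seed=1)
```

Each step adds one node and $m$ edges and leaves exactly $m$ nodes vigorous, so the dormant set grows by one node per step and is never empty when $m < m_0$.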

3. Degree distribution 

Denoting by $A_{k'}^{t}$ the number of vigorous nodes with in-degree $k'$ at time $t$, one can write
out the differential equation

$$\frac{dA_{k'+1}^{t}}{dt} = (1 - 2\nu_{k'})\,(A_{k'}^{t} + \mu_{k'}^{t}) - A_{k'+1}^{t} = \left(1 - \frac{2(\gamma - 1)}{a + k'}\right)(A_{k'}^{t} + \mu_{k'}^{t}) - A_{k'+1}^{t} \tag{2}$$

for network evolution, where $\mu_{k'}^{t}$ is the activation probability. Imposing the stationary
condition $dA_{k'}^{t}/dt = 0$, one obtains

$$A_{k'+1} = (1 - 2\nu_{k'})\,(A_{k'} + \mu_{k'}). \tag{3}$$

The probability of a dormant node being activated is assumed to be uniform, so $\mu_{k'}^{t}$
takes the form

$$\mu_{k'}^{t} = \frac{N_{k'}^{t}}{m_0 + t - m - 2}, \tag{4}$$

where $N_{k'}^{t}$ represents the number of dormant nodes with in-degree $k'$ at time $t$. For
large $t$, the total number of nodes in the network is approximately equal to the number
of dormant nodes, and the overall in-degree distribution $n_{k'}$ can be approximated by
considering the dormant nodes only. Thus, we obtain the relationship

$$\mu_{k'} = n_{k'}, \tag{5}$$

and $n_{k'}$ can be calculated as the rate of change of the vigorous nodes $A_{k'}$,

$$n_{k'} = A_{k'} - A_{k'+1}. \tag{6}$$

Substituting Eqs. (5) and (6) into Eq. (3) yields

$$A_{k'} = A_0 \prod_{i=0}^{k'-1} \frac{i + a + 2 - 2\gamma}{i + a + 1 - \gamma} = A_0 \exp\left[ \sum_{i=0}^{k'-1} \ln\left( 1 + \frac{1 - \gamma}{i + a + 1 - \gamma} \right) \right], \tag{7}$$

where the boundary value $A_0$ is equal to 1, reflecting the constant addition of newcomers
with initial $k' = 0$. In the following, we give analytical solutions corresponding to
different $a$.
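
The substitution step can be spelled out in a few lines. The derivation below is a sketch that assumes the stationary relation $A_{k'+1} = (1 - 2\nu_{k'})(A_{k'} + \mu_{k'})$ for Eq. (3) and the form $\nu_{k'} = (\gamma - 1)/(a + k')$ for Eq. (1), both read off from the surrounding definitions.

```latex
\begin{align*}
A_{k'+1} &= (1 - 2\nu_{k'})\,(2A_{k'} - A_{k'+1})
  && \text{using } \mu_{k'} = n_{k'} = A_{k'} - A_{k'+1} \\
(2 - 2\nu_{k'})\,A_{k'+1} &= 2\,(1 - 2\nu_{k'})\,A_{k'} \\
\frac{A_{k'+1}}{A_{k'}} &= \frac{1 - 2\nu_{k'}}{1 - \nu_{k'}}
  = \frac{k' + a + 2 - 2\gamma}{k' + a + 1 - \gamma}
\end{align*}
```

Iterating this ratio from $k' = 0$ with $A_0 = 1$ gives the product form of Eq. (7), and writing each factor as $1 + (1 - \gamma)/(i + a + 1 - \gamma)$ gives the exponential form.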

(i) The case of small $a$ and $a > m$. By the approximate Taylor expansion of the
logarithm, Eq. (7) can be written as

$$A_{k'} = (a + 1 - \gamma)^{\gamma - 1}\,(k' + a + 1 - \gamma)^{-(\gamma - 1)}, \tag{8}$$

and the overall in-degree distribution $n_{k'}$ is

$$n_{k'} = -\frac{dA_{k'}}{dk'} = c\,(k' + a + 1 - \gamma)^{-\gamma}. \tag{9}$$

The normalization factor is $c = (\gamma - 1)(a + 1 - \gamma)^{\gamma - 1}$. The exponent $\gamma$ can be obtained
from the self-consistency condition $m = \int_0^{\infty} k' n_{k'}\,dk'$, which gives

$$\gamma = 1 + \frac{m + a}{m + 1}. \tag{10}$$

Thus, the exponent $\gamma$ depends on the parameters $a$ and $m$. Setting $a = m + 2$ gives
$\gamma = 3$. Figure 1 shows the total degree distribution obtained by simulating the
model for $10^5$ time steps. As expected, we obtain power-law distributions with
best-fitted exponent $\gamma$ equal to 2.82(9), 2.92(9), and 2.96(5), corresponding to $m = 10$, 20,
and 40, respectively.
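
The relation between the exact product and its power-law approximation can be checked numerically. The Python sketch below assumes the product form $A_{k'} = \prod_{i=0}^{k'-1} (i + a + 2 - 2\gamma)/(i + a + 1 - \gamma)$ with $A_0 = 1$, the approximation $A_{k'} = (a + 1 - \gamma)^{\gamma - 1}(k' + a + 1 - \gamma)^{-(\gamma - 1)}$, and $\gamma = 1 + (m + a)/(m + 1)$; all function names are hypothetical.

```python
import math


def gamma_exponent(m, a):
    """Exponent gamma = 1 + (m + a) / (m + 1), as in Eq. (10)."""
    return 1.0 + (m + a) / (m + 1.0)


def a_exact(k, a, gamma):
    """Product form of A_{k'} with A_0 = 1 (assumed form of Eq. (7))."""
    value = 1.0
    for i in range(k):
        value *= (i + a + 2 - 2 * gamma) / (i + a + 1 - gamma)
    return value


def a_power_law(k, a, gamma):
    """Continuum approximation of A_{k'} (assumed form of Eq. (8))."""
    b = a + 1 - gamma
    return (b / (k + b)) ** (gamma - 1)


def local_slope(f, k, a, gamma):
    """Log-log slope of f measured between k and 2k."""
    return math.log(f(2 * k, a, gamma) / f(k, a, gamma)) / math.log(2.0)


m = 10
a = m + 2                      # the choice a = m + 2 discussed in the text
gamma = gamma_exponent(m, a)   # equals 3 for this choice
slope_exact = local_slope(a_exact, 1000, a, gamma)
slope_approx = local_slope(a_power_law, 1000, a, gamma)
```

Both slopes approach $-(\gamma - 1) = -2$ at large $k'$, so the approximation captures the tail exponent. The prefactors of the two forms need not coincide, because the first-order expansion of the logarithm is crude for the first few factors of the product.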

(ii) The case of $a \to \infty$. The deactivation probability $\nu_{k'}$ is independent of $k'$,
which means that each of the $m + 2$ vigorous nodes will be deactivated with the same
probability $1/(m + 2)$. Thus, Eq. (7) can be written as

$$A_{k'} = \exp\left[ k' \ln\left( \frac{a + 2 - 2\gamma}{a + 1 - \gamma} \right) \right] = \left( \frac{m}{m + 1} \right)^{k'}. \tag{11}$$

Figure 1. (Color online) Degree distributions of nodes of generated networks in the case
of $a = m + 2$ for $m = 10$ (square), 20 (circle) and 40 (triangle), respectively. The
size of the networks is $N = 10^5$. The solid lines are least-squares fits based on the form of
Eq. (9).

Then, the overall in-degree distribution $n_{k'}$ is

$$n_{k'} = -\frac{dA_{k'}}{dk'} = \ln\left( \frac{m + 1}{m} \right) \left( \frac{m}{m + 1} \right)^{k'}. \tag{12}$$

To obtain the total degree distribution, we rewrite the above equation as

$$n_{k} = \ln\left( \frac{m + 1}{m} \right) \left( \frac{m}{m + 1} \right)^{k - m}, \tag{13}$$

where $k = k' + m$. Thus, the distribution decays exponentially. In Fig. 2 we plot the
total degree distribution of the simulated networks for $m = 10$, 20, and 40, respectively.
As expected, we obtain exponential distributions with best-fitted decay factor $m/(m + 1)$
being 0.90(9), 0.95(2), and 0.97(5), corresponding to $m = 10$, 20, and 40, respectively.
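
The predicted decay factor can be tabulated directly. The Python sketch below assumes the exponential forms $n_{k'} = \ln[(m + 1)/m]\,[m/(m + 1)]^{k'}$ for the in-degree and the same expression shifted by $k = k' + m$ for the total degree; the function names are hypothetical.

```python
import math


def decay_base(m):
    """Per-step decay factor m / (m + 1) of the exponential distribution."""
    return m / (m + 1.0)


def n_in(k_in, m):
    """In-degree distribution n_{k'} (assumed form of Eq. (12))."""
    return math.log((m + 1.0) / m) * decay_base(m) ** k_in


def n_total(k, m):
    """Total-degree distribution, obtained by the shift k = k' + m."""
    return n_in(k - m, m)


bases = {m: decay_base(m) for m in (10, 20, 40)}
```

For $m = 10$, 20 and 40 the factors are $10/11$, $20/21$ and $40/41$, which can be compared with the fitted values quoted above.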

(iii) The case of $m \ll a < \infty$. When $k'$ is small, $A_{k'}$ can be approximated by
Eq. (11), while for large $k'$, $A_{k'}$ can be described by the approximate Taylor expansion
of the logarithm. Therefore, there exists a tipping point $k_c$ in the degree distribution.
For $k'$ smaller than $k_c$, Eq. (7) can be simplified to

$$A_{k'} = \left( \frac{m}{m + 1} \right)^{k'}. \tag{14}$$

Figure 2. (Color online) Degree distributions of nodes of generated networks in the case
of $a \to \infty$. The solid lines are least-squares fits based on the form of Eq. (12).

For $k'$ larger than $k_c$, Eq. (7) reduces to

$$A_{k'} = \left( \frac{a + 2 - 2\gamma}{a + 1 - \gamma} \right)^{k_c} (k_c + a + 1 - \gamma)^{\gamma - 1}\,(k' + a + 1 - \gamma)^{-(\gamma - 1)}. \tag{15}$$

Combining the above two expressions, one can obtain the overall in-degree distribution $n_{k'}$,

$$n_{k'} = (\gamma - 1) \left( \frac{a + 2 - 2\gamma}{a + 1 - \gamma} \right)^{k_c} (k_c + a + 1 - \gamma)^{\gamma - 1}\,(k' + a + 1 - \gamma)^{-\gamma}. \tag{16}$$

In Fig. 3 we plot the total degree distribution of the generated networks with parameter
$a = 200$ for $m = 10$, 20, and 40, respectively. All the plots are right-skewed, in
agreement with the theoretical prediction.

4. Comparison with empirical data 

To test the present model, we use four empirical data sets from citation networks.

(i) PNAS data [21], which contain 23,572 articles and 40,853 edges, covering articles
published in the Proceedings of the National Academy of Sciences (PNAS) of the United
States of America from 1998 to 2007.

Figure 3. (Color online) Degree distributions of nodes of generated networks in the case
of $m \ll a < \infty$. The solid lines are least-squares fits based on the form of Eq. (16).

(ii) Hep-th data [25], which come from preprints posted on arxiv.org and cover
papers in the period from January 1992 to April 2003 (124 months). They contain 27,770
papers and 352,807 edges.

(iii) Hep-ph data [25], which come from preprints posted on arxiv.org and cover
papers in the period from January 1992 to April 2003 (124 months). They contain 34,546
papers and 421,578 edges.

(iv) U.S. Patent data [26], which are maintained by the National Bureau of Economic
Research. The data include all citations made by patents granted between 1975 and
1999, and contain 3,774,768 nodes and 16,518,948 edges.

Figure 4 shows the comparison of the degree statistics of the four citation networks with
numerical results for the generated networks. To obtain values of $m_0$, $m$ and $a$, which refer
to the number of initial isolated nodes, the number of references per paper and the
attractiveness bias, respectively, we fit the empirical distribution based on Eq. (16).
Although the empirical networks are different in nature, all the cumulative in-degree
distributions follow a right-skewed decay, which shifts from an exponential to a power
law. Table 1 shows empirical data on the citation distribution of papers and the
parameters $m_0$, $m$ and $a$ assessed by simulation, and one notices the good agreement.

Figure 4. (Color online) Comparison of empirical networks with simulation results of
the present network model. The simulation parameters $m_0 = 3145$, $m = 2$, $a = 24$;
$m_0 = 631$, $m = 13$, $a = 12$; $m_0 = 2117$, $m = 13$, $a = 18$; and
$m_0 = 470{,}978$, $m = 5$, $a = 12.5$ correspond to PNAS, Hep-th, Hep-ph and U.S.
Patent data, respectively.
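
The average out-degree of each empirical network follows from the node and edge counts quoted in the data set list, as the number of edges divided by the number of nodes. The short Python sketch below recomputes it; the dictionary layout and variable names are illustrative only.

```python
# (nodes, edges) for each data set, as quoted in the text
datasets = {
    "PNAS": (23572, 40853),
    "Hep-th": (27770, 352807),
    "Hep-ph": (34546, 421578),
    "U.S. Patent": (3774768, 16518948),
}

# average out-degree = number of edges / number of nodes
mean_out_degree = {
    name: edges / nodes for name, (nodes, edges) in datasets.items()
}
rounded = {name: round(value, 1) for name, value in mean_out_degree.items()}
```

Rounded to one decimal place, the results are 1.7, 12.7, 12.2 and 4.4, matching the average out-degrees listed in Table 1.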



5. Conclusion 

In summary, we have proposed a simple model for citation networks to explain the
phenomenon of delayed recognition in the life of an article, whose citation activity usually
starts low, rises to a peak, and then diminishes. We suggested that the activity of a paper is the
result of the competition between vigorousness and dormancy. The growth dynamics of the
network is governed by the state transitions. We found that the average number of
references per paper $m$ and the initial attractiveness of different fields $a$ determine the
topological structure of the generated network. If the value of $a$ is chosen
as $m + 2$, the deactivation probability $\nu(k')$ is a linear preferential one, which leads to
a power-law degree distribution with exponent $\gamma = 3$. As $a$ tends to infinity,
the vigorous nodes are selected for deactivation with uniform probability, and the
model gives rise to an exponential degree distribution whose decay depends only
on $m$. Between the two regimes, the distribution gradually shifts from the exponential to
the power law. To test the theoretical prediction, we compared the degree distribution
with empirical citation networks and found good agreement. The present model thus
provides a new way to understand citation networks with age.

Table 1. Basic statistics of the PNAS, Hep-th, Hep-ph and U.S. Patent data. N, E and ⟨k⟩
denote the number of nodes, the number of edges and the average out-degree in the four
empirical networks, respectively. N', E', m_0, m and a are parameters of the simulated
networks: N' and E' denote the numbers of nodes and edges, m_0 represents the number of
initial isolated nodes, and m and a represent the average out-degree and the constant bias.

Measures   PNAS      Hep-th     Hep-ph     U.S. Patent
N          23,572    27,770     34,546     3,774,768
E          40,853    352,807    421,578    16,518,948
⟨k⟩        1.7       12.7       12.2       4.4
N'         23,572    27,770     34,546     3,774,768
E'         40,853    352,807    421,578    16,518,948
m_0        3145      631        2117       470,978
m          2         13         13         5
a          24        12         18         12.5